Add support for CUDA >= 12.9 #757

Merged
merged 17 commits into from
Jul 30, 2025

Conversation

Kh4L
Contributor

@Kh4L Kh4L commented Jul 7, 2025

Recent CUDA versions don't support non-context NPP calls, so use the ctx-based API calls.
Also CUDA 12.9+ deprecates nppGetStreamContext, so we need to build the NPP context manually.

@facebook-github-bot facebook-github-bot added the CLA Signed label Jul 7, 2025
Member

@NicolasHug NicolasHug left a comment

Thanks a lot for sending this PR @Kh4L, this is very helpful.

I applied our linter and also enabled testing for CUDA 12.9 so that we can correctly check the PR.

Do I understand correctly that in 12.9 we have to use the context-based API, while at the same time the context creation helper was removed?! This sounds error-prone. Is there any way we could avoid manually building and setting the context attributes?

@Kh4L
Contributor Author

Kh4L commented Jul 10, 2025

@NicolasHug LMK if you need anything else!

The assertion error doesn't seem related to my change.

@NicolasHug
Member

Nothing else to do on your side @Kh4L, thank you. I'll merge this soon; I'll just try to extract all the #ifdef stuff into its own single function before merging.

@NicolasHug NicolasHug changed the title from "Adapt NPP calls for CUDA >= 12.9" to "Add support for CUDA >= 12.9" Jul 30, 2025
@NicolasHug
Member

Let me try to summarize the changes and add context for other reviewers, and for future reference.

From their release notes, CUDA 12.9 deprecates:

Removing the non-context APIs makes sense to me. I don't understand the logic behind removing nppGetStreamContext(). And I'm not the only one. But OK.

We now have to manually create the NppContext, which is the bulk of this PR. Note that OpenCV is doing something very similar as well: https://github.com/opencv/opencv/pull/27288/files.
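For readers unfamiliar with what "manually creating" the context involves, here is a minimal sketch of the field-by-field copy. It is not the PR's createNppStreamContext: the helper name fillStreamContext and the two stand-in structs are hypothetical, introduced only so the sketch compiles without the CUDA toolkit. The stand-in field names are meant to mirror cudaDeviceProp and NppStreamContext as declared in the CUDA and NPP headers.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for cudaDeviceProp (assumption: only the fields needed here,
// using the same names as the CUDA runtime struct).
struct DevicePropsStandIn {
  int multiProcessorCount;
  int maxThreadsPerMultiProcessor;
  int maxThreadsPerBlock;
  std::size_t sharedMemPerBlock;
  int major;
  int minor;
};

// Stand-in for NppStreamContext (assumption: mirrors its field names;
// void* replaces cudaStream_t so no CUDA header is needed).
struct StreamContextStandIn {
  void* hStream;
  int nCudaDeviceId;
  int nMultiProcessorCount;
  int nMaxThreadsPerMultiProcessor;
  int nMaxThreadsPerBlock;
  std::size_t nSharedMemPerBlock;
  int nCudaDevAttrComputeCapabilityMajor;
  int nCudaDevAttrComputeCapabilityMinor;
  unsigned int nStreamFlags;
};

// Hypothetical helper: copies device properties, the stream, and the stream
// flags into a context object. Templated so the real CUDA/NPP types can be
// substituted for the stand-ins.
template <typename Ctx, typename Props, typename Stream>
Ctx fillStreamContext(
    const Props& prop,
    int deviceIndex,
    Stream stream,
    unsigned int streamFlags) {
  assert(deviceIndex >= 0);
  Ctx ctx{}; // value-initialize so fields not set below are zero
  ctx.hStream = stream;
  ctx.nCudaDeviceId = deviceIndex;
  ctx.nMultiProcessorCount = prop.multiProcessorCount;
  ctx.nMaxThreadsPerMultiProcessor = prop.maxThreadsPerMultiProcessor;
  ctx.nMaxThreadsPerBlock = prop.maxThreadsPerBlock;
  ctx.nSharedMemPerBlock = prop.sharedMemPerBlock;
  ctx.nCudaDevAttrComputeCapabilityMajor = prop.major;
  ctx.nCudaDevAttrComputeCapabilityMinor = prop.minor;
  ctx.nStreamFlags = streamFlags;
  return ctx;
}
```

With the real headers, the same helper would presumably be instantiated as fillStreamContext<NppStreamContext>(prop, deviceIndex, stream, flags), with prop obtained from cudaGetDeviceProperties() and flags from cudaStreamGetFlags(). Error checking on those calls is omitted here.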

// NppStreamContext hStream and nStreamFlags should not be part of the cache
// because they may change across calls.
NppStreamContext nppCtx = createNppStreamContext(
    static_cast<int>(getFFMPEGCompatibleDeviceIndex(device_)));
Member

A note on the cache: I originally implemented the "cache" as a simple nppCtx_ attribute on the CudaDeviceInterface class: 565896e (#757)

But I don't think that would be correct: the CudaDeviceInterface instance is global, and we only have one single instance for all CUDA devices. And we can't use one single NppContext for all CUDA devices - we need one NppContext per device.

So, we need a per-device cache for the NppContext, similar to our existing hw_device_ctx cache. I'm leaving that for an immediate follow-up.
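To make the follow-up concrete, here is a rough sketch of what a per-device cache could look like. This is not the project's implementation: PerDeviceCache, its factory callback, and the locking strategy are all assumptions made for illustration, and the template parameter stands in for NppStreamContext so the sketch compiles without CUDA.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical per-device cache: builds one context per device index on
// first use and reuses it afterwards. Ctx stands in for NppStreamContext.
template <typename Ctx>
class PerDeviceCache {
 public:
  using Factory = std::function<Ctx(int)>;

  explicit PerDeviceCache(Factory factory) : factory_(std::move(factory)) {}

  // Returns a copy of the cached context for this device, so the caller can
  // set per-call fields (such as the stream and stream flags) without
  // mutating the cached entry.
  Ctx get(int deviceIndex) {
    assert(deviceIndex >= 0);
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(deviceIndex);
    if (it == cache_.end()) {
      // First request for this device: build the context once and store it.
      it = cache_.emplace(deviceIndex, factory_(deviceIndex)).first;
    }
    return it->second;
  }

 private:
  Factory factory_;
  std::mutex mutex_; // guards cache_ for concurrent callers
  std::unordered_map<int, Ctx> cache_;
};
```

Returning a copy rather than a reference is a deliberate choice in this sketch: it fits the note above that hStream and nStreamFlags may change across calls and so should not live in the cache.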

@@ -265,37 +303,37 @@ void CudaDeviceInterface::convertAVFrameToFrameOutput(
dst = allocateEmptyHWCTensor(height, width, device_);
}

// Use the user-requested GPU for running the NPP kernel.
c10::cuda::CUDAGuard deviceGuard(device_);
Member

This guard isn't needed anymore as we now explicitly pass the current device to the NppContext creation.

at::cuda::CUDAStream nppStreamWrapper =
    c10::cuda::getStreamFromExternal(nppGetStream(), device_.index());
nppDoneEvent.record(nppStreamWrapper);
nppDoneEvent.block(at::cuda::getCurrentCUDAStream());
Member

These syncs aren't needed anymore because we now explicitly ask NPP to rely on PyTorch's current stream.

@NicolasHug NicolasHug merged commit ee42162 into pytorch:main Jul 30, 2025
44 of 45 checks passed